1.  INTRODUCTION

There has recently been a revival of interest in Greenwood's (1946)
statistic, labeled G below, and its use in testing for uniformity of
points on a line. Burrows (1979) has published percentage points
for G for this test, for samples of size n up to 10. In this
paper we extend Burrows' tables, and add some brief comments on
recent work on the effectiveness of G and related statistics as
test statistics for uniformity.

Greenwood's statistic G is calculated from a sample of n
values in the interval (0,1) as follows. Let $x_1, x_2, \ldots, x_n$ be
the sample values, in ascending order, and define $d_i$ to be the
spacing $d_i = x_i - x_{i-1}$, $i = 2, \ldots, n$; let $d_1 = x_1$ and
$d_{n+1} = 1 - x_n$. Greenwood's statistic is then

$$G = \sum_{i=1}^{n+1} d_i^2 .$$

Suppose $H_0$ is the null hypothesis that the $x_i$ are, before being
ordered, a random sample from the uniform distribution with limits 0
and 1. The statistic G is a natural statistic for testing $H_0$;
however, the distribution of G, under $H_0$, is difficult to find, even
for small samples.

Moran (1947) gives many results on G, including the moments under $H_0$,
and both Moran and Burrows (1979) have references to early work on the
distribution theory of G. Moran showed by means of the moments that
G has an asymptotic normal distribution, but the rate of convergence
to normality is so slow that this result cannot be used to give percentage
points for test purposes. Burrows gives points for n ≤ 10, and these are
derived from a recursion relation. To these we now add, in Table 1,
points for n > 10, for nG rather than G. These were found by
fitting Pearson curves to the moments. This technique will give
very accurate results (Solomon and Stephens, 1978), and we verify
this by comparing points for n = 10 with those of Burrows (1979).

The  statistic  nG  is  tabulated  rather  than  G  to  make  interpolation 
easier. 
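
The slow convergence to normality noted above can be illustrated with a short calculation. The sketch below assumes the standard null mean 2/(n+2) and variance 4n/{(n+2)^2 (n+3)(n+4)} of G, which are not written out in this paper; the function names are ours. It compares the normal-approximation 1% points of nG with the tabulated values for three sample sizes.

```python
from math import sqrt
from statistics import NormalDist


def ng_mean_sd(n):
    """Null mean and standard deviation of nG (assumed standard results)."""
    mean = 2.0 * n / (n + 2)
    var = 4.0 * n**3 / ((n + 2) ** 2 * (n + 3) * (n + 4))
    return mean, sqrt(var)


def normal_point(n, p):
    """p-quantile of the normal approximation to the distribution of nG."""
    mean, sd = ng_mean_sd(n)
    return NormalDist(mean, sd).inv_cdf(p)


# Lower and upper 1% points of nG, copied from Table 1 of this paper.
table1 = {20: (1.311, 2.970), 100: (1.616, 2.528), 500: (1.811, 2.228)}

for n, (lower, upper) in table1.items():
    print(n,
          round(normal_point(n, 0.01), 3), lower,
          round(normal_point(n, 0.99), 3), upper)
```

Under these assumptions the normal approximation falls below the tabulated point in both tails, which is the pattern one would expect from a distribution with a long upper tail, and the gap is still visible at n = 500.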

Hill (1979) has given an algorithm, also using four moments, to
approximate the percentage points by Johnson curves. He compares
the approximate points for n = 10 with Burrows' exact points and
finds good agreement. Unfortunately the points given by Hill are
incorrect; the algorithm reproduces a misprint in the moments given
by Moran (1947). The third moment about the mean, $\mu_3$, has numerator
$8n(10n-4)$ and not $8(10n^2-4)$ as given by Moran and used by Hill.
The error and correction have been recently confirmed by a private
communication from Professor Moran,* and of course it is easy to correct
the Hill algorithm. When corrected, the plot of $\beta_2$ follows a
curve somewhat like that drawn by Hill, but with smaller values of
$\beta_1$; some values are given in Table 2. The Johnson curves, as before,
will change character near n = 12, and the reservations expressed by
Hill would still apply.

* This was also pointed out by Hartley and Pfaffenberger (1972).
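
For readers who wish to check the corrected third moment, the following sketch evaluates the first three null moments of G in exact rational arithmetic. Only the numerator 8n(10n-4) is taken from the text; the mean, the variance and the denominator of $\mu_3$ are standard results that we assume here (they follow from the Dirichlet distribution of uniform spacings), and the function names are ours.

```python
from fractions import Fraction


def greenwood_moments(n):
    """Mean, variance and third central moment of G under H0 (exact)."""
    mean = Fraction(2, n + 2)
    var = Fraction(4 * n, (n + 2) ** 2 * (n + 3) * (n + 4))
    # Corrected numerator 8n(10n - 4); the denominator is an assumption
    # supplied for this sketch, not quoted from the paper.
    mu3 = Fraction(8 * n * (10 * n - 4),
                   (n + 2) ** 3 * (n + 3) * (n + 4) * (n + 5) * (n + 6))
    return mean, var, mu3


def beta1(n):
    """Squared skewness beta_1 = mu_3^2 / mu_2^3."""
    _, var, mu3 = greenwood_moments(n)
    return mu3 ** 2 / var ** 3


for n in (5, 10, 12, 20, 100):
    print(n, float(beta1(n)))
```

The case n = 1 offers an independent check, since G = x^2 + (1-x)^2 can then be integrated directly: the three moments are 2/3, 1/45 and 2/945, which agree with the formulas above.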

Hill also mentions the fact that use of the lower endpoint
of the distribution (the smallest value of G is 1/(n+1), when all
$d_i = 1/(n+1)$) gave worse results than using four moments only. This
difficulty might well disappear with the correction. In our experience,
use of three moments and the lower endpoint gave improved values in the
lower tail, but worse in the upper tail; this would be expected, since
the lower tail is very steep for small n (see Burrows' Figure 1) and
use of the correct endpoint will be a great advantage. However, as n
increases the difference in percentage points obtained by the two methods
is smaller; more importantly, the upper tail points are those most likely
to be needed in practice, so for Table 1 it was decided to use a
four-moment fit. This fit, for n = 10, gives lower 1%, lower 5%, upper 5%
and upper 1% points for 10G equal to 1.154, 1.222, 2.412 and 2.997, to be
compared with Burrows' exact values of 1.117, 1.211, 2.404, and 3.008.
Except for the lower 1% point, there will be negligible error in
significance level obtained by using the Pearson curve points for this
value of n, and the accuracy can be expected to improve as n becomes
larger. Using three moments and the lower endpoint the points are 1.116,
1.207, 2.399, 2.987. (Incidentally, the points given by Hill for n = 10
are marginally better than the Pearson curve 4-moment fit, thus
demonstrating that it is sometimes better to be approximately wrong than
to be approximately right. For n = 5, however, the Pearson curve points
are definitely better.)
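
The size of each discrepancy for n = 10 can be read off with a few lines of arithmetic. The sketch below uses only the values of 10G quoted in the paragraph above; the variable and function names are ours.

```python
# 10G percentage points for n = 10, in the order
# lower 1%, lower 5%, upper 5%, upper 1% (values quoted in the text).
labels = ["lower 1%", "lower 5%", "upper 5%", "upper 1%"]
exact = [1.117, 1.211, 2.404, 3.008]   # Burrows (1979)
pc4 = [1.154, 1.222, 2.412, 2.997]     # Pearson curve, four moments
pc3 = [1.116, 1.207, 2.399, 2.987]     # three moments and lower endpoint


def errors(approx, reference):
    """Signed differences approx - reference, rounded to 3 decimals."""
    return [round(a - r, 3) for a, r in zip(approx, reference)]


for label, e4, e3 in zip(labels, errors(pc4, exact), errors(pc3, exact)):
    print(f"{label}: PC(4) error {e4:+.3f}, PC(3) error {e3:+.3f}")
```

The largest four-moment error is at the lower 1% point, while the endpoint fit is closest there and furthest off at the upper 1% point.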

It  will  be  interesting  to  see  a  comparison  of  Johnson  curve  values 
with  the  Pearson  curve  values;  previous  experience  has  often  shown  that 
they  are  very  close  together.  However,  the  author  does  not  have  the 
Johnson  curve  algorithm,  and  has  in  fact  had  the  Pearson  curve  algorithm 
for  several  years  (they  have  been  used  in  comparing  various  goodness-of- 
fit  procedures)  so  it  seems  worthwhile  to  present  them  now.  The  algorithm 
used  is  being  prepared  for  publication.  It  involves  interpolation  in 
the extensive tables of Pearson curves given in Biometrika Tables, Vol. 2.
Computer  routines  are  also  available  (Solomon  and  Stephens,  1978). 

2.  COMMENTS  ON  GOODNESS-OF-FIT 

Over the years, there has been a steady interest in statistics
based on spacings in general, and Pyke (1965) gives a very full
discussion of work to that date. More recently, there has developed
new interest in G itself or statistics similar to G. Hartley and
Pfaffenberger (1972) introduced a statistic $S^2$ which is related to
G by $S^2 = \{(n+1)G - 1\}(n+2)$, so that it is effectively the same as G
for testing purposes. The author, in an unpublished Technical Report
(Stephens, 1974), made some comparisons of $S^2$ with EDF (empirical
distribution function) statistics $D$, $W^2$, $U^2$ and $A^2$, in tests for
uniformity and concluded that $S^2$ (i.e., G) has power somewhere
between $U^2$ and $A^2$, depending on the alternative. The points for
$S^2$ were derived from the points now given for G. Further comparisons
have been made by Quesenberry and Miller (1977) and the author has also
continued a study of tests for uniformity, the results to be published
elsewhere. Hartley and Pfaffenberger (1972) and del Pino (1978) have
also discussed k-spacings, i.e. spacings between the ordered $x_i$,
taken k at a time. Pyke (1965) refers to doubts that spacings
provide effective statistics for tests of uniformity, but these
issues are not yet clear; see del Pino (1978) for the most recent work
on these lines, and for other references. Spacings have a natural
appeal, for example, when they arise as the intervals between events
formed by a renewal process, and it is hoped that provision of
percentage points for G or for nG will encourage further work on the
properties of this statistic.
TABLE 1

Upper and lower percentage points for nG

Sample                           Percentage level
size             Lower tail                        Upper tail
  n      .01    .025     .05     .10       .10     .05    .025     .01

  12    1.198   1.234   1.272   1.326     2.204   2.441   2.683   3.015
  14    1.233   1.272   1.312   1.368     2.227   2.457   2.691   3.014
  16    1.263   1.304   1.346   1.403     2.242   2.464   2.691   3.003
  18    1.288   1.332   1.375   1.433     2.251   2.466   2.685   2.988
  20    1.311   1.356   1.400   1.459     2.258   2.465   2.677   2.970
  25    1.358   1.405   1.451   1.510     2.265   2.456   2.651   2.920
  30    1.395   1.444   1.490   1.549     2.265   2.443   2.624   2.873
  40    1.453   1.502   1.548   1.605     2.258   2.415   2.573   2.790
  50    1.495   1.544   1.589   1.644     2.248   2.389   2.531   2.723
  60    1.529   1.577   1.621   1.674     2.238   2.367   2.495   2.669
  80    1.579   1.625   1.666   1.716     2.220   2.331   2.441   2.587
 100    1.616   1.659   1.698   1.745     2.205   2.304   2.400   2.528
 200    1.714   1.750   1.781   1.818     2.159   2.226   2.289   2.371
 500    1.811   1.836   1.858   1.884     2.107   2.147   2.183   2.228
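
An entry of Table 1 can be checked roughly by simulation. The sketch below is only an illustration and is not the Pearson curve method used to build the table: it draws uniform samples of size n = 12, computes nG for each, and compares crude empirical quantiles with the tabulated lower and upper 5% points. The function names, the seed and the number of replications are arbitrary choices of ours.

```python
import random


def greenwood(sample):
    """Greenwood's statistic: sum of the n + 1 squared spacings."""
    xs = sorted(sample)
    edges = [0.0] + xs + [1.0]
    return sum((b - a) ** 2 for a, b in zip(edges, edges[1:]))


def simulate_ng(n, replications, seed=0):
    """Simulate nG under the uniform null hypothesis; return sorted values."""
    rng = random.Random(seed)
    values = []
    for _ in range(replications):
        sample = [rng.random() for _ in range(n)]
        values.append(n * greenwood(sample))
    values.sort()
    return values


def empirical_point(sorted_values, p):
    """Crude empirical p-quantile (order statistic, no interpolation)."""
    index = min(len(sorted_values) - 1, int(p * len(sorted_values)))
    return sorted_values[index]


sims = simulate_ng(n=12, replications=20000)
# The lower-tail .05 point is the 0.05 quantile of nG;
# the upper-tail .05 point is the 0.95 quantile.
for p, tabled in [(0.05, 1.272), (0.95, 2.441)]:
    print(p, round(empirical_point(sims, p), 3), tabled)
```

With 20000 replications the simulated quantiles carry sampling error of the order of 0.01 in the upper tail, so agreement can only be expected to about two decimal places.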
TABLE 3

Comparisons of exact points for G with various approximations.
PC(4) and PC(3) refer to Pearson curve approximations
using 4 moments or 3 moments and the lower end point.

n = 5

  α:        .01     .05     .10     .90     .95     .99

  Exact    .1839   .1994   .2101   .3830   .4320   .5475
  Hill     .1988   .2074   .2144   .3860   .4377   .5505
  PC(4)    .1942   .2026   .2104   .3856   .4330   .5401
  PC(3)    .1837   .1979   .2085   .3831   .4292   .5381

n = 10

  α:        .01     .05     .10     .90     .95     .99

  Exact    .1116   .1211   .1272   .2157   .2404   .3008
  Hill     .1138   .1220   .1276   .2161   .2406   .3007
  PC(4)    .1154   .1222   .1274   .2168   .2412   .2997
  PC(3)    .1116   .1207   .1268   .2160   .2399   .2987